/*__________________________________________________________________________________

Replication files and instructions: "Beauty, Job Tasks, and Wages: A New Conclusion
                                     about  Employer Taste-Based Discrimination"

___________________________________________________________________________________*/


1) FILES:

  1a) rep_beauty.do: Stata do-file. Produces all empirical results in the paper.
  1b) berea_beauty_data.dta: Stata-formatted dataset (variables listed below).
      Note: the data are confidential; see the instructions below for access.
  1c) histogram.py (Python code): creates the histograms in Figure 1
  1d) gen_bw_hist.ipynb (Jupyter notebook): reads in the attractiveness rating data (ratings_by_person.dta)
      and calls histogram.py to create the histograms in Figure 1
  1e) ratings_by_person.dta: a dataset that only contains a person identifier, gender, and
      attractiveness rating (described below)

2) INSTRUCTIONS:

  2a) All results in the paper (except Figure 1): run the Stata do-file rep_beauty.do. Important:
      before running, create a directory "tables" in the directory where the do-file is saved.
      The do-file generates all results in the order presented in the paper. LaTeX tables are
      written to the "tables" directory.

  2b) Figure 1: Important: before running, create a directory "figures" in the directory where the
      notebook is saved. Run the Jupyter notebook gen_bw_hist.ipynb. The notebook reads in the rating
      and gender data in ratings_by_person.dta and creates the two sub-figures in Figure 1: histograms
      of attractiveness ratings for men and for women. Figures are written to the "figures" directory.

  2c) Instructions for obtaining access to the data:

      The respondent confidentiality agreement for the project does not allow the data to be made publicly
      available. However, the data can be made available for replication purposes in a confidential data
      center at the University of Western Ontario. To inquire, contact Todd Stinebrickner at trstineb@uwo.ca.

  2d) Software versions
      Stata 12; Python 2.7.13; Jupyter notebook: JupyterLab 0.32.1; operating system: Linux Mint 18
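
The directory setup required in 2a and 2b can also be scripted. The sketch below is not part of the
replication package; the function name ensure_output_dirs is hypothetical, and it assumes that
rep_beauty.do and gen_bw_hist.ipynb are saved in the same directory (base_dir). It uses only the
standard library and avoids Python 3-only arguments so that it should also run under Python 2.7.

```python
import os


def ensure_output_dirs(base_dir, names=("tables", "figures")):
    """Create each output directory under base_dir if it does not exist yet.

    Hypothetical helper: "tables" and "figures" are the directory names the
    README asks for; base_dir is assumed to hold the do-file and the notebook.
    """
    paths = []
    for name in names:
        path = os.path.join(base_dir, name)
        if not os.path.isdir(path):
            os.makedirs(path)
        paths.append(path)
    return paths


# Example call, run from the directory that holds the replication files:
# ensure_output_dirs(os.getcwd())
```

Calling the function a second time is harmless: existing directories are left untouched.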


3) VARIABLE DEFINITIONS

3a) berea_beauty_data.dta:

Identifiers:
  id: person identifier
  time: time identifier

logrw1 = log real wage
std_rating = standardized attractiveness rating
std_coll_gpa = standardized college GPA
age = age in years
hs_gpa = high school GPA
faminc = family income
female = binary variable for female
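
For readers unfamiliar with standardized variables such as std_rating and std_coll_gpa, the sketch
below shows one common construction: subtract the mean and divide by the standard deviation. It is an
illustration only. The function name standardize is hypothetical, and the choice of the sample
standard deviation (n - 1 denominator) and of the full non-missing sample as the reference group are
assumptions; this file does not state how the variables were actually built.

```python
import math


def standardize(values):
    """Return z-scores for a list of numbers; None (missing) entries stay None.

    Assumption: mean and standard deviation are taken over all non-missing
    entries, with the sample (n - 1) standard deviation.
    """
    observed = [v for v in values if v is not None]
    n = len(observed)
    mean = sum(observed) / float(n)
    variance = sum((v - mean) ** 2 for v in observed) / float(n - 1)
    sd = math.sqrt(variance)
    return [None if v is None else (v - mean) / sd for v in values]
```

Under these assumptions the result has mean zero and standard deviation one, so a regression
coefficient on it is read as the effect of a one-standard-deviation change.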

Task variables: fraction of time spent on each task (each between zero and one)
  i_high = fraction of time on high-skilled information tasks
  i_low  = fraction of time on low-skilled information tasks
  p_high = fraction of time on high-skilled people tasks
  p_low  = fraction of time on low-skilled people tasks

Binary variables identifying task specialization:
  spec_p_high_no_o = 1 if specialize in p_high; 0 otherwise
  spec_p_low_no_o  = 1 if specialize in p_low; 0 otherwise
  spec_i_high_no_o = 1 if specialize in i_high; 0 otherwise
  spec_i_low_no_o  = 1 if specialize in i_low; 0 otherwise

Binary variables for female attractiveness quartiles (coded missing for men):
  fem_r_quart1 = 1 if female w/ attractiveness in 1st quartile; 0 otherwise
  fem_r_quart2 = 1 if female w/ attractiveness in 2nd quartile; 0 otherwise
  fem_r_quart3 = 1 if female w/ attractiveness in 3rd quartile; 0 otherwise
  fem_r_quart4 = 1 if female w/ attractiveness in 4th quartile; 0 otherwise
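
The coding pattern of the quartile indicators (0/1 for women, missing for men) can be illustrated
with the sketch below. It is not taken from rep_beauty.do. The function name
female_quartile_dummies is hypothetical, and two points are assumptions: that the quartile cutoffs
are computed among women only, and that the 1st quartile is the lowest-rated group. The sketch
needs Python 3.8 or later for statistics.quantiles, which is newer than the Python version listed
in 2d.

```python
from statistics import quantiles


def female_quartile_dummies(ratings, female):
    """Return one row [q1, q2, q3, q4] per person.

    Women get 0/1 indicators; men get [None, None, None, None] (missing).
    Assumptions: cutoffs come from women's ratings only, and quartile 1
    holds the lowest ratings.
    """
    women = [r for r, f in zip(ratings, female) if f == 1]
    cuts = quantiles(women, n=4)  # three cutoffs splitting women into four groups
    rows = []
    for r, f in zip(ratings, female):
        if f != 1:
            rows.append([None] * 4)  # coded missing for men
            continue
        quartile = 1 + sum(r > c for c in cuts)
        rows.append([1 if quartile == k else 0 for k in (1, 2, 3, 4)])
    return rows
```

Because the indicators are missing for men, any regression that includes them is estimated on the
female subsample only.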

Self-rated skills: binary variables for "rate self in top 25%"

  char_com_self_high = 1 if rate own communication skills top 25%; 0 otherwise
  char_per_self_high = 1 if rate own personality top 25%; 0 otherwise
  char_rel_self_high = 1 if rate own relatability top 25%; 0 otherwise

Variables valid only in t=1 (missing otherwise). These variables are used to compute
descriptive statistics for the time-invariant demographics in Table 1:

  i_rating   = person i's attractiveness rating (average of 50 evaluators; rated on a 5-point scale)
  i_coll_gpa = college GPA (4-point scale)
  i_hs_gpa   = high school GPA (4-point scale)
  i_faminc = family income

3b) ratings_by_person.dta

  id = person identifier
  rating = attractiveness rating (average of 50 evaluators; rated on a 5-point scale) (identical to i_rating)
  female = binary variable for female
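
The three variables above are all that is needed for histograms by gender. The sketch below shows
the underlying tabulation using only the standard library; it is not the code in histogram.py, and
plotting is left out. The function names, the tuple layout of the records, and the bin settings
passed by the caller (for example a range of 1 to 5, which is an assumption about how the 5-point
scale is coded) are all hypothetical.

```python
def histogram_counts(values, lo, hi, bins):
    """Count values in `bins` equal-width bins over [lo, hi].

    Missing (None) and out-of-range values are skipped; a value equal to
    hi is placed in the last bin.
    """
    width = (hi - lo) / float(bins)
    counts = [0] * bins
    for v in values:
        if v is None or v < lo or v > hi:
            continue
        k = min(int((v - lo) / width), bins - 1)
        counts[k] += 1
    return counts


def counts_by_gender(records, lo, hi, bins):
    """records: iterable of (id, rating, female) tuples, mirroring the
    three variables in ratings_by_person.dta."""
    records = list(records)
    men = [r for _, r, f in records if f == 0]
    women = [r for _, r, f in records if f == 1]
    return {
        "men": histogram_counts(men, lo, hi, bins),
        "women": histogram_counts(women, lo, hi, bins),
    }
```

Using the same lo, hi, and bins for both groups keeps the two sub-figures directly comparable.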
